MOLECULAR DIFFERENCES BETWEEN SPECIES OF THE
M. TUBERCULOSIS COMPLEX 

Tuberculosis is an ancient human scourge that continues to be an important 
public health problem worldwide. It is an ongoing epidemic of staggering proportions.
Approximately one in every three people in the world is infected with Mycobacterium 
tuberculosis, and has a 10% lifetime risk of progressing from infection to clinical disease. 
Although tuberculosis can be treated, an estimated 2.9 million people died from the 
disease last year. 

There are significant problems with a reliance on drug treatment to control active
M. tuberculosis infections. Most of the regions having high infection rates are less
developed countries, which suffer from a lack of easily accessible health services,
diagnostic facilities and suitable antibiotics against M. tuberculosis. Even where these
are available, patient compliance is often poor because of the lengthy regimen required
for complete treatment, and multidrug-resistant strains are increasingly common.

Prevention of infection would circumvent the problems of treatment, and so 
vaccination against tuberculosis is widely performed in endemic regions. Around 100 
million people a year are vaccinated with live bacillus Calmette-Guerin (BCG) vaccine. 
BCG has the great advantage of being inexpensive and easily administered under less
than optimal circumstances, with few adverse reactions. Unfortunately, the vaccine is
widely variable in its efficacy, providing anywhere from 0 to 80% protection against 
infection with M. tuberculosis. 

BCG has an interesting history. It is an attenuated strain of M. bovis, a very close
relative of M. tuberculosis. The M. bovis strain that became BCG was isolated from a
cow in the late 1800's by a bacteriologist named Nocard, hence it was called Nocard's
bacillus. The attenuation of Nocard's bacillus took place from 1908 to 1921, over the
course of 230 in vitro passages. Thereafter, it was widely grown throughout the world,
resulting in additional hundreds and sometimes thousands of in vitro passages.
Throughout its many years in the laboratory, there has been selection for cross-reaction
with the tuberculin skin test, and for decreased side effects. The net result has been a
substantially weakened pathogen, which may be ineffective in raising an adequate
immune response.

New antituberculosis vaccines are urgently needed for the general population in
endemic regions, for HIV-infected individuals, and for health care professionals likely
to be exposed to tubercle bacilli. Recombinant DNA vaccines bearing protective genes
from virulent M. tuberculosis are being developed using shuttle phasmids to transfer
genetic material from one mycobacterial species to another; for example, see U.S. Patent
no. 5,776,465. Tuberculosis vaccine development should be given a high priority in
current medical research goals.

Relevant literature 

Mahairas et al. (1996) J. Bacteriol. 178(5):1274-1282 provides a molecular analysis
of genetic differences between Mycobacterium bovis BCG and virulent M. bovis.
Subtractive genomic hybridization was used to identify genetic differences between
virulent M. bovis and M. tuberculosis and avirulent BCG. U.S. Patent no. 5,700,683 is
directed to these genetic differences.

Cole et al. (1998) Nature 393:537-544 have described the complete genome of M.
tuberculosis. To obtain the contiguous genome sequence, a combined approach was
used that involved the systematic sequence analysis of selected large-insert clones as
well as random small-insert clones from a whole-genome shotgun library. This
culminated in a composite sequence of 4,411,529 base pairs, with a G + C content of
65.6%. A total of 3,924 open reading frames were identified in the genome, accounting
for approximately 91% of the potential coding capacity.

Mycobacterium tuberculosis (M.tb.) genomic sequence is available at several 
internet sites, including h ttpV/www.cric.com/htdo^/tMhArculosis/inH^v ht ml and
http://ww w. sanaer ac.uk/pathog ftn . 

Summary of the Invention 
Genetic markers are provided that distinguish between strains of the 
Mycobacterium tuberculosis complex, particularly between avirulent and virulent strains. 
Strains of interest include M. bovis, M. bovis BCG strains, M. tuberculosis (M. tb.) 
isolates, and bacteriophages that infect mycobacteria. The genetic markers are used for 
assays, e.g. immunoassays, that distinguish between strains, such as to differentiate 
between BCG immunization and M. tb. infection. The protein products may be produced 
and used as an immunogen, in drug screening, etc. The markers are useful in 
constructing genetically modified M. tb or M. bovis cells having improved vaccine 
characteristics. 
Detailed Description of the Embodiments
Specific genetic deletions are identified that serve as markers to distinguish
between avirulent and virulent mycobacteria strains, including M. bovis, M. bovis BCG
strains, M. tuberculosis (M. tb.) isolates, and bacteriophages that infect mycobacteria.
These deletions are used as genetic markers to distinguish between the different
mycobacteria. The deletions may be introduced into M. tb. or M. bovis by recombinant
methods in order to render a pathogenic strain avirulent. Alternatively, the deleted genes
are identified in the M. tb. genome sequence, and are then reintroduced by recombinant
methods into BCG or other vaccine strains, in order to improve the efficacy of
vaccination.

The deletions of the invention are identified by comparative DNA hybridizations
from genomic sequence of mycobacterium to a DNA microarray comprising
representative sequences of the M. tb. coding sequences. The deletions are then
mapped to the known M. tb. genome sequence in order to specifically identify the deleted
gene(s), and to characterize the nucleotide sequence of the deleted region.

Nucleic acids comprising the provided deletions and junctions are used in a
variety of applications. Hybridization probes may be obtained from the known M. tb.
sequence which correspond to the deleted sequences. Such probes are useful in
distinguishing between mycobacteria. For example, there is a 10% probability that an
M. tb. infected person will progress to clinical disease, but that probability may vary
depending on the particular infecting strain. Analysis for the presence or absence of the
deletions provided below as "M. tb variable" is used to distinguish between different M. tb
strains. The deletions are also useful in identifying whether a patient that is positive for a
tuberculin skin test has been infected with M. tb or with BCG.

In another embodiment of the invention, mycobacteria are genetically altered to
delete sequences identified herein as absent in attenuated strains, but present in
pathogenic strains, e.g. deletions found in BCG but present in M. tb H37Rv. Such
genetically engineered strains may provide superior vaccines to the present BCG isolates
in use. Alternatively, BCG strains may be "reconstructed" to more closely resemble
wild-type M. tb by inserting certain of the deleted sequences back into the genome. Since
the protein products of the deleted sequences are expressed in virulent mycobacterial
species, the encoded proteins are useful as immunogens for vaccination.

The attenuation (loss of virulence) in BCG is attributed to the loss of genetic
material at a number of places throughout the genome. The selection over time for fewer
side-effects resulting from BCG immunization, while retaining cross-reactivity with the
tuberculin skin test, has provided an excellent screen for those sequences that engender
side effects. The identification of deletions that vary between BCG isolates identifies
such sequences, which may be used in drug screening and biological analysis for the
role of the deleted genes in causing untoward side effects and pathogenicity.

Identification of M. tuberculosis Complex Deletion Markers
The present invention provides nucleic acid sequences that are markers for
specific mycobacteria, including M. tb., M. bovis, BCG and bacteriophage. The deletions
are listed in Table 1. The absence or presence of these marker sequences is
characteristic of the indicated isolate, or strain. As such, they provide a unique
characteristic for the identification of the indicated mycobacteria. The deletions are
identified by their M. tb. open reading frame (Rv nomenclature), which corresponds to a
known genetic sequence, and may be accessed as previously cited. The junctions of the
deletions are provided by the designation of position in the publicly available M. tb.
sequence.
The Rv column indicates the public M. tb. sequence open reading frame. The BCG strains
were obtained as follows:



Strains employed in study of BCG phylogeny

Name of strain                 Synonym        Source    Descriptors
BCG-Russia                     Moscow         ATCC      # 35740
BCG-Moreau                     Brazil         ATCC      # 35736
BCG-Moreau                     Brazil         IAF       dated 1958
BCG-Moreau                     Brazil         IAF       dated 1961
BCG-Japan                      Tokyo          ATCC      # 35737
BCG-Japan                      Tokyo          IAF       dated 1961
BCG-Japan                      Tokyo          JATA      vaccine strain
BCG-Japan                      Tokyo          JATA      bladder cancer strain
BCG-Japan                      Tokyo          JATA      clinical isolate, adenitis
BCG-Sweden                     Gothenburg     ATCC      # 35732
BCG-Sweden                     Gothenburg     IAF       dated 1958
BCG-Sweden                     Gothenburg     SSI       production lot, Copenhagen
BCG-Phipps                     Philadelphia   ATCC      # 35744
BCG-Denmark                    Danish 1331    ATCC      # 35733
BCG-Copenhagen                                ATCC      # 27290
BCG-Copenhagen                                IAF       dated 1961
BCG-Tice                       Chicago        Vaccine   dated 1973
BCG-Tice                       Chicago        ATCC      # 35743
BCG-Frappier                   Montreal       IAF       primary lot, 1973
BCG-Frappier, INH-resistant    Montreal       IAF       primary lot, 1973
BCG-Frappier                   Montreal       IAF       passage 946
BCG-Connaught                  Toronto        CL        bladder cancer treatment
BCG-Birkhaug                                  ATCC      # 35731
BCG-Prague                     Czech          SSI       lyophilized 1968
BCG-Glaxo                                     Vaccine   dated 1973
BCG-Glaxo                                     ATCC      # 35741
BCG-Pasteur                                   IAF       passage 888
BCG-Pasteur                                   IAF       dated 1961
BCG-Pasteur                                   IP        1173P2-B
BCG-Pasteur                                   IP        1173P2-C
BCG-Pasteur                                   IP        clinical isolate # 1
BCG-Pasteur                                   IP        clinical isolate # 2
BCG-Pasteur                                   ATCC      # 35734

ATCC = American Type Culture Collection, Rockville, Md., USA; SSI = Statens Serum
Institute, Copenhagen, Denmark; CL = Connaught Laboratories, Canada; JATA = Japanese
Anti-Tuberculosis Association.

Toronto, the .atter beinc ? derived fZ^oa^r ^ ***** l ° BCG - Mo ^a. *"d BCG- 
In performing the initial screening method, genomic DNA is isolated from two
mycobacteria microbial cell cultures. The two DNA preparations are labeled, where a
different label is used for the first and second microbial cultures, typically using
nucleotides conjugated to a fluorochrome that emits at a wavelength substantially
different from that of the fluorochrome tagged nucleotides used to label the selected
probe. The strains used were the reference strain of Mycobacterium tuberculosis
(H37Rv), other M. tb. laboratory strains, such as H37Ra, the O strain, M. tb. clinical
isolates, the reference strain of Mycobacterium bovis, and different strains of
Mycobacterium bovis BCG.

The two DNA preparations are mixed, and competitive hybridization is carried out
to a microarray representing all of the open reading frames in the genome of the test
microbe, usually H37Rv. Hybridization of the labeled sequences is accomplished
according to methods well known in the art. In a preferred embodiment, the two probes
are combined to provide for a competitive hybridization to a single microarray.
Hybridization can be carried out under conditions varying in stringency, preferably under
conditions of high stringency (e.g., 4x SSC, 10% SDS, 65°C) to allow for hybridization of
complementary sequences having extensive homology (e.g., having at least 85%
sequence identity, preferably at least 90% sequence identity, more preferably having at
least 95% sequence identity). Where the target sequences are native sequences, the
hybridization is preferably carried out under conditions that allow hybridization of only
highly homologous sequences (e.g., at least 95% to 100% sequence identity).

Two color fluorescent hybridization is utilized to assay the representation of the
unselected library in relation to the selected library (i.e., to detect hybridization of the
unselected probe relative to the selected probe). From the ratio of one color to the other,
for any particular array element, the relative abundance of that sequence in the
unselected and selected library can be determined. In addition, comparison of the
hybridization of the selected and unselected probes provides an internal control for the
assay. An absence of signal from the reference strain, as compared to H37Rv, is 
indicative that the open reading frame is deleted in the test strain. The deletion may be 
further mapped by Southern blot analysis, and by sequencing the regions flanking the 
deletion. 
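
The ratio comparison described above can be illustrated with a minimal sketch. It assumes that each array element has already been reduced to one fluorescence intensity per channel; the function name, the background floor, the log2 cutoff and the intensity values are all hypothetical and are not taken from this specification.

```python
import math


def call_deletions(test_signal, ref_signal, background=50.0, log2_cutoff=-2.0):
    """Return ORF names whose test/reference log2 ratio is at or below the cutoff.

    test_signal and ref_signal map an ORF name to a fluorescence intensity.
    background and log2_cutoff are illustrative values chosen for this sketch.
    """
    deleted = []
    for orf, ref in ref_signal.items():
        if ref <= background:
            # The reference channel gave no usable signal, so the spot is skipped.
            continue
        # Floor the test signal at background to avoid taking the log of zero.
        test = max(test_signal.get(orf, 0.0), background)
        if math.log2(test / ref) <= log2_cutoff:
            deleted.append(orf)
    return sorted(deleted)


# Hypothetical intensities for three array elements.
h37rv = {"ORF_A": 4000.0, "ORF_B": 3500.0, "ORF_C": 30.0}
test_strain = {"ORF_A": 60.0, "ORF_B": 3300.0, "ORF_C": 25.0}
candidates = call_deletions(test_strain, h37rv)
print(candidates)
```

In this sketch an element is reported only when the reference channel is clearly above background, which reflects the use of the second channel as an internal control. Candidate elements would still need confirmation, for example by Southern blot or sequencing as described above.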

Microarrays can be scanned to detect hybridization of the selected and the
unselected sequences using a custom built scanning laser microscope as described in
Shalon et al., Genome Res. 6:639 (1996). A separate scan, using the appropriate
excitation line, is performed for each of the two fluorophores used. The digital images
generated from the scan are then combined for subsequent analysis. For any particular
array element, the fluorescent signal from the amplified selected cell population DNA
is compared to the fluorescent signal from the unselected cell population DNA, and the
relative abundance of that sequence in the selected and unselected library determined.

Nucleic Acid Compositions

As used herein, the term "deletion marker", or "marker" is used to refer to those
sequences of M. tuberculosis complex genomes that are deleted in one or more of the
strains or species, as indicated in Table 1. The bacteria of the M. tuberculosis complex
include M. tuberculosis, M. bovis, and BCG, inclusive of varied isolates and strains within
each species. Nucleic acids of interest include all or a portion of the deleted region,
particularly complete open reading frames, hybridization primers, promoter regions, etc.

The term "junction" or "deletion junction" is used to refer to nucleic acids that 
comprise the regions on both the 3' and the 5' sequence immediately flanking the 
deletion. Such junction sequences are preferably used as short primers, e.g. from about 
15 nt to about 30 nt, that specifically hybridize to the junction, but not to a nucleic acid 
comprising the undeleted genomic sequence. For example, the deletion found in M. 
bovis, at Rv0221, corresponds to the nucleotide sequence of the M. tuberculosis H37Rv 
genome, segment 12: 17432.19335. The junction comprises the regions upstream of 
position 17342, and downstream of 19335, e.g. a nucleic acid of 20 nucleotides 
comprising the sequence from H37Rv 17332-17342 joined to 19335-19345. 

Typically, such nucleic acids comprising a junction will include at least about 7 
nucleotides from each flanking region, i.e. from the 3' and from the 5' sequence adjacent 
to the deletion, and may be about 10 nucleotides from each flanking region, up to about 
15 nucleotides, or more. Amplification primers that hybridize to the junction sequence, to 
the deleted sequence, and to the flanking non-deleted regions have a variety of uses, as
detailed below.
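
As an illustration of how a junction sequence is assembled from the two flanking regions, the following minimal sketch joins a chosen number of nucleotides upstream of a deletion to the same number downstream. The function name, the 60-nt reference sequence and the coordinates are hypothetical and were chosen only to keep the example small; they do not correspond to any deletion in Table 1.

```python
def junction_oligo(genome, del_start, del_end, flank=10):
    """Join `flank` bases upstream of a deletion to `flank` bases downstream.

    del_start and del_end are 1-based, inclusive coordinates of the deleted
    region in the undeleted reference sequence.
    """
    if del_start - 1 < flank or len(genome) - del_end < flank:
        raise ValueError("not enough flanking sequence")
    upstream = genome[del_start - 1 - flank:del_start - 1]
    downstream = genome[del_end:del_end + flank]
    return upstream + downstream


# Hypothetical reference in which positions 21-40 are deleted in the test strain.
reference = "ATGGCTAGCTTACGGATCCGTTAACCGGTTTAGCATGCAAGCTTGGCACTGGCCGTCGTT"
deleted_genome = reference[:20] + reference[40:]

oligo = junction_oligo(reference, del_start=21, del_end=40, flank=10)
print(oligo, oligo in deleted_genome, oligo in reference)
```

The junction oligonucleotide occurs in the deleted sequence but not in the undeleted reference, which is the property relied on when such a sequence is used as a specific probe or primer.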

The nucleic acid compositions of the subject invention encode all or a part of
the deletion markers. Fragments may be obtained of the DNA sequence by chemically
synthesizing oligonucleotides in accordance with conventional methods, by restriction
enzyme digestion, by PCR amplification, etc. For the most part, DNA fragments will be at
least about 25 nt in length, usually at least about 30 nt, more usually at least about 50 nt.
For use in amplification reactions, such as PCR, a pair of primers will be used. The exact
composition of the primer sequences is not critical to the invention, but for most
applications the primers will hybridize to the subject sequence under stringent conditions,
as known in the art. It is preferable to choose a pair of primers that will generate an
amplification product of at least about 50 nt, preferably at least about 100 nt. Algorithms
for the selection of primer sequences are generally known, and are available in
commercial software packages. Amplification primers hybridize to complementary
strands of DNA, and will prime towards each other.

Usually, the DNA will be obtained substantially free of other nucleic acid 
sequences that do not include a deletion marker sequence or fragment thereof, generally 
being at least about 50%, usually at least about 90% pure and are typically 
"recombinant", i.e. flanked by one or more nucleotides with which it is not normally 
associated on a naturally occurring chromosome. 

For screening purposes, hybridization probes of one or more of the deletion 
sequences may be used in separate reactions or spatially separated on a solid phase 
matrix, or labeled such that they can be distinguished from each other. Assays may 
utilize nucleic acids that hybridize to one or more of the described deletions. 

An array may include all or a subset of the deletion markers listed in Table 1. 
Usually such an array will include at least 2 different deletion marker sequences, i.e. 
deletions located at unique positions within the locus, and may include all of the provided 
deletion markers. Arrays of interest may further comprise other genetic sequences, 
particularly other sequences of interest for tuberculosis screening. The oligonucleotide 
sequence on the array will usually be at least about 12 nt in length, may be the length of 
the provided deletion marker sequences, or may extend into the flanking regions to 
generate fragments of 100 to 200 nt in length. For examples of arrays, see Ramsay
(1998) Nat. Biotech. 16:40-44; Hacia et al. (1996) Nature Genetics 14:441-447; Lockhart
et al. (1996) Nature Biotechnol. 14:1675-1680; and De Risi et al. (1996) Nature Genetics
14:457-460.

Nucleic acids may be naturally occurring, e.g. DNA or RNA, or may be synthetic 
analogs, as known in the art. Such analogs may be preferred for use as probes because 
of superior stability under assay conditions. Modifications in the native structure, 
including alterations in the backbone, sugars or heterocyclic bases, have been shown to 
increase intracellular stability and binding affinity. Among useful changes in the 
backbone chemistry are phosphorothioates; phosphorodithioates, where both of the
non-bridging oxygens are substituted with sulfur; phosphoroamidites; alkyl
phosphotriesters and boranophosphates. Achiral phosphate derivatives include
3'-O-5'-S-phosphorothioate, 3'-S-5'-O-phosphorothioate, 3'-CH2-5'-O-phosphonate and
3'-NH-5'-O-phosphoroamidate. Peptide nucleic acids replace the entire ribose
phosphodiester backbone with a peptide linkage.

Sugar modifications are also used to enhance stability and affinity. The
α-anomer of deoxyribose may be used, where the base is inverted with respect to the
natural β-anomer. The 2'-OH of the ribose sugar may be altered to form 2'-O-methyl or
2'-O-allyl sugars, which provides resistance to degradation without compromising affinity.

Modification of the heterocyclic bases must maintain proper base pairing. Some
useful substitutions include deoxyuridine for deoxythymidine; 5-methyl-2'-deoxycytidine
and 5-bromo-2'-deoxycytidine for deoxycytidine. 5-propynyl-2'-deoxyuridine and
5-propynyl-2'-deoxycytidine have been shown to increase affinity and biological activity
when substituted for deoxythymidine and deoxycytidine, respectively.

Polypeptide Compositions
The specific deletion markers in Table 1 correspond to open reading frames of the
M. tb genome, and therefore encode a polypeptide. The subject markers may be
employed for synthesis of a complete protein, or polypeptide fragments thereof, 
particularly fragments corresponding to functional domains; binding sites; etc.; and 
including fusions of the subject polypeptides to other proteins or parts thereof. For 
expression, an expression cassette may be employed, providing for a transcriptional and 
translational initiation region, which may be inducible or constitutive, where the coding 
region is operably linked under the transcriptional control of the transcriptional initiation 
region, and a transcriptional and translational termination region. Various transcriptional 
initiation regions may be employed that are functional in the expression host. 

The polypeptides may be expressed in prokaryotes or eukaryotes in accordance 
with conventional ways, depending upon the purpose for expression. For large scale 
production of the protein, a unicellular organism, such as E. coli, B. subtilis, S. cerevisiae, 
or cells of a higher organism such as vertebrates, particularly mammals, e.g. COS 7 
cells, may be used as the expression host cells. Small peptides can also be synthesized 
in the laboratory. 

With the availability of the polypeptides in large amounts, by employing an 
expression host, the polypeptides may be isolated and purified in accordance with 
conventional ways. A lysate may be prepared of the expression host and the lysate 
purified using HPLC, exclusion chromatography, gel electrophoresis, affinity
chromatography, or other purification technique. The purified polypeptide will generally
be at least about 80% pure, preferably at least about 90% pure, and may be up to and
including 100% pure. Pure is intended to mean free of other proteins, as well as cellular
debris.

The polypeptide is used for the production of antibodies, where short fragments
provide for antibodies specific for the particular polypeptide, and larger fragments or the
entire protein allow for the production of antibodies over the surface of the polypeptide.
Antibodies may be raised to isolated peptides corresponding to particular domains, or to
the native protein.

Antibodies are prepared in accordance with conventional ways, where the
expressed polypeptide or protein is used as an immunogen, by itself or conjugated to
known immunogenic carriers, e.g. KLH, pre-S HBsAg, other viral or eukaryotic proteins,
or the like. Various adjuvants may be employed, with a series of injections, as
appropriate. For monoclonal antibodies, after one or more booster injections, the spleen
is isolated, the lymphocytes immortalized by cell fusion, and then screened for high
affinity antibody binding. The immortalized cells, i.e. hybridomas, producing the desired
antibodies may then be expanded. For further description, see Monoclonal Antibodies: A
Laboratory Manual, Harlow and Lane eds., Cold Spring Harbor Laboratories, Cold Spring
Harbor, New York, 1988. If desired, the mRNA encoding the heavy and light chains may
be isolated and mutagenized by cloning in E. coli, and the heavy and light chains mixed
to further enhance the affinity of the antibody. Alternatives to in vivo immunization as a
method of raising antibodies include binding to phage "display" libraries, usually in
conjunction with in vitro affinity maturation.

The antibody may be produced as a single chain, instead of the normal multimeric
structure. Single chain antibodies are described in Jost et al. (1994) J.B.C. 269:26267-
73, and others. DNA sequences encoding the variable region of the heavy chain and the
variable region of the light chain are ligated to a spacer encoding at least about 4 amino
acids of small neutral amino acids, including glycine and/or serine. The protein encoded
by this fusion allows assembly of a functional variable region that retains the specificity
and affinity of the original antibody.
Use of Deletion Markers in Identification of Mycobacteria
The deletions provided in Table 1 are useful for the identification of a
mycobacterium as (a) variants of M. tb., (b) isolates of BCG, (c) M. bovis strains, or (d)
carrying the identified mycobacterial bacteriophage, depending on the specific marker
that is chosen. Such screening is particularly useful in determining whether a particular
infection or isolate is pathogenic. The term mycobacteria may refer to any member of the
family Mycobacteriaceae, including M. tuberculosis, M. avium complex, M. kansasii, M.
scrofulaceum, M. bovis and M. leprae.

Methods of detecting deletions are known in the art. Deletions may be identified
through the absence or presence of the sequences in mRNA or genomic DNA, through
analysis of junctional regions that flank the deletion, or detection of the gene product, or,
particularly relating to the tuberculin skin test, by identification of antibodies that react
with the encoded gene product.

While deletions can be easily determined by the absence of hybridization, in many
cases it is desirable to have a positive signal, in order to minimize artifactual negative
readings. In such cases the deletions may be detected by designing a primer that
flanks the junction formed by the deletion. Where the deletion is present, a novel
sequence is formed between the flanking regions, which can be detected by
hybridization. Preferably such a primer will be sufficiently short that it will only hybridize
to the junction, and will fail to form stable hybrids with either of the separate parts of the
junction.

Diagnosis is performed by protein, DNA or RNA sequence and/or hybridization
analysis of any convenient sample, e.g. cultured mycobacteria, biopsy material, blood
sample, etc. Screening may also be based on the functional or antigenic characteristics
of the protein. Immunoassays designed to detect the encoded proteins from deleted
sequences may be used in screening.

A number of methods are available for analyzing nucleic acids for the presence of
a specific sequence. Where large amounts of DNA are available, genomic DNA is used
directly. Alternatively, the region of interest is cloned into a suitable vector and grown in
sufficient quantity for analysis. The nucleic acid may be amplified by conventional
techniques, such as the polymerase chain reaction (PCR), to provide sufficient amounts
for analysis. The use of the polymerase chain reaction is described in Saiki et al. (1985)
Science 239:487, and a review of current techniques may be found in Sambrook et al.,
Molecular Cloning: A Laboratory Manual, CSH Press 1989, pp. 14.2-14.33. Amplification
may also be used to determine whether a polymorphism is present, by using a primer
that is specific for the polymorphism. Alternatively, various methods are known in the art
that utilize oligonucleotide ligation, for examples see Riley et al. (1990) N.A.R. 18:2887-
2890; and Delahunty et al. (1996) Am. J. Hum. Genet. 58:1239-1246.
A detectable label may be included in an amplification reaction. Suitable labels
include fluorochromes, e.g. fluorescein isothiocyanate (FITC), rhodamine, Texas Red,
phycoerythrin, allophycocyanin, 6-carboxyfluorescein (6-FAM), 2',7'-dimethoxy-4',5'-
dichloro-6-carboxyfluorescein (JOE), 6-carboxy-X-rhodamine (ROX), 6-carboxy-
2',4',7',4,7-hexachlorofluorescein (HEX), 5-carboxyfluorescein (5-FAM) or N,N,N',N'-
tetramethyl-6-carboxyrhodamine (TAMRA); radioactive labels, e.g. 32P, 35S, 3H; etc. The
label may be a two stage system, where the amplified DNA is conjugated to biotin,
haptens, etc. having a high affinity binding partner, e.g. avidin, specific antibodies, etc.,
where the binding partner is conjugated to a detectable label. The label may be
conjugated to one or both of the primers. Alternatively, the pool of nucleotides used in
the amplification is labeled, so as to incorporate the label into the amplification product.

The sample nucleic acid, e.g. amplified or cloned fragment, is analyzed by one of
a number of methods known in the art. The nucleic acid may be sequenced by dideoxy
or other methods, and the sequence of bases compared to the deleted sequence.
Hybridization with the variant sequence may also be used to determine its presence, by
Southern blots, dot blots, etc. The hybridization pattern of a control and variant
sequence to an array of oligonucleotide probes immobilized on a solid support, as
described in US 5,445,934, or in WO95/35505, may also be used as a means of
detecting the presence of variable sequences. Single strand conformational
polymorphism (SSCP) analysis, denaturing gradient gel electrophoresis (DGGE),
mismatch cleavage detection, and heteroduplex analysis in gel matrices are used to
detect conformational changes created by DNA sequence variation as alterations in
electrophoretic mobility. Alternatively, where a polymorphism creates or destroys a
recognition site for a restriction endonuclease (restriction fragment length polymorphism,
RFLP), the sample is digested with that endonuclease, and the products size fractionated
to determine whether the fragment was digested. Fractionation is performed by gel or
capillary electrophoresis, particularly acrylamide or agarose gels.
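
The RFLP comparison described above can be sketched in a few lines. The example assumes a linear fragment and exact recognition-site matching, and uses EcoRI (recognition site GAATTC, cut after the first G) purely for illustration; the function name and both 26-nt sequences are hypothetical.

```python
def digest_fragments(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at each site.

    cut_offset is the number of bases into the recognition site at which the
    top strand is cut (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical sequences that differ by one base inside the EcoRI site.
control = "CCGGTTAACC" + "GAATTC" + "TTGGCCAAGG"
variant = "CCGGTTAACC" + "GAGTTC" + "TTGGCCAAGG"

control_fragments = digest_fragments(control, "GAATTC", 1)
variant_fragments = digest_fragments(variant, "GAATTC", 1)
print(control_fragments, variant_fragments)
```

The control yields two fragments while the variant remains uncut, which is the size difference that gel or capillary electrophoresis would resolve.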

The hybridization pattern of a control and variant sequence to an array of
oligonucleotide probes immobilized on a solid support, as described in US 5,445,934, or
in WO95/35505, may be used as a means of detecting the presence of deleted
sequences. In one embodiment of the invention, an array of oligonucleotides is
provided, where discrete positions on the array are complementary to at least a portion of
M. tb. genomic DNA, usually comprising at least a portion from the identified open
reading frames. Such an array may comprise a series of oligonucleotides, each of which
can specifically hybridize to a nucleic acid, e.g. mRNA, cDNA, genomic DNA, etc.

Deletions may also be detected by amplification. In an embodiment of the
invention, sequences are amplified that include a deletion junction, i.e. where the
amplification primers hybridize to a junction sequence. In a nucleic acid sample where
the marker sequence is deleted, a junction will be formed, and the primer will hybridize,
thereby allowing amplification of a detectable sequence. In a nucleic acid sample where
the marker sequence is present, the primer will not hybridize, and no amplification will
take place. Alternatively, amplification primers may be chosen such that amplification of
the target sequence will only take place where the marker sequence is present. The
amplification products may be separated by size using any convenient method, as known
in the art, including gel electrophoresis, chromatography, capillary electrophoresis,
density gradient fractionation, etc.
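
The two amplification strategies described above can be compared with a simple in silico sketch. It assumes exact-match priming on a single strand and uses a hypothetical 60-nt template in which positions 21-40 stand in for a marker sequence; the function names and primers are illustrative only.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(template, fwd, rev):
    """Length of the product primed by fwd and rev on template, or None.

    Only exact matches are considered; real primers tolerate some mismatch.
    """
    start = template.find(fwd)
    if start < 0:
        return None
    rev_site = template.find(revcomp(rev), start)
    if rev_site < 0:
        return None
    return rev_site + len(rev) - start


# Hypothetical template; positions 21-40 (1-based) represent the marker.
intact = "ATGGCTAGCTTACGGATCCGTTAACCGGTTTAGCATGCAAGCTTGGCACTGGCCGTCGTT"
deleted = intact[:20] + intact[40:]

junction_fwd = intact[13:20] + intact[40:47]  # 7 nt from each flank
marker_fwd = intact[22:34]                    # lies inside the marker
shared_rev = revcomp(intact[-10:])            # downstream of the deletion

for name, template in (("intact", intact), ("deleted", deleted)):
    print(name,
          amplicon_length(template, junction_fwd, shared_rev),
          amplicon_length(template, marker_fwd, shared_rev))
```

In this sketch the junction primer yields a product only from the deleted template, and the marker-internal primer yields a product only from the intact template, so either strain gives a positive signal with one of the two primer sets.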

In addition to the detection of deletions by the detection of junction sequences,
or detection of the marker sequences themselves, one may determine the presence or
absence of the encoded protein product. The specific deletions in Table 1 correspond to
open reading frames of the M. tb genome, and therefore encode a polypeptide.
Polypeptides are detected by means known in the art, including determining the presence
of the specific polypeptide in a sample through biochemical, functional or immunological
characterization. The detection of antibodies in patient serum that react with a
polypeptide is of particular interest.

Immunization with BCG typically leads to a positive response against tuberculin
antigens in a skin test. In people who have been immunized, which includes a significant
proportion of the world population, it is therefore difficult to determine whether a positive
test is the result of an immune reaction to the BCG vaccine, or to an ongoing M. tb
infection. The subject invention has provided a number of open reading frame
sequences that are present in M. tb isolates, but are absent in BCG. As a primary or a
secondary screening method, one may test for immunoreactivity of the patient with the
polypeptides encoded by such deletion markers. Diagnosis may be performed by a
number of methods. The different methods all determine the presence of an immune
response to the polypeptide in a patient, where a positive response is indicative of an M.
tb infection. The immune response may be determined by determination of antibody
binding, or by the presence of a response to intradermal challenge with the polypeptide.

In one method, a dose of the deletion marker polypeptide, formulated as a
cocktail of proteins or as individual protein species, in a suitable medium is injected
subcutaneously into the patient. The dose will usually be at least about 0.05 µg of
protein, and usually not more than about 5 µg of protein. A control comprising medium
alone, or an unrelated protein, will be injected nearby at the same time. The site of
injection is examined after a period of time for the presence of a wheal. The wheal at the
site of polypeptide injection is compared to that at the site of the control injection, usually
by measuring the size of the wheal. The skin test readings may be assessed by a variety
of objective grading systems. A positive result for the presence of an allergic condition
will show an increased diameter at the site of polypeptide injection as compared to the
control, usually at least about 50% increase in size, more usually at least 100% increase
in size.
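
The wheal comparison can be expressed as a short calculation. The sketch below assumes that "size" means wheal diameter in millimetres and that the 50% and 100% figures are applied as simple cut-offs; the function names and the grading labels are illustrative assumptions, not part of the described method.

```python
def wheal_increase_percent(test_mm: float, control_mm: float) -> float:
    """Percent increase of the test wheal diameter over the control wheal."""
    if control_mm <= 0:
        raise ValueError("control wheal diameter must be positive")
    return (test_mm - control_mm) / control_mm * 100.0


def grade_skin_test(test_mm: float, control_mm: float) -> str:
    """Grade a reading using the 50% and 100% increases named in the text."""
    increase = wheal_increase_percent(test_mm, control_mm)
    if increase >= 100.0:
        return "positive (>=100% increase)"
    if increase >= 50.0:
        return "positive (>=50% increase)"
    return "negative"


print(grade_skin_test(9.0, 4.0))
print(grade_skin_test(6.0, 4.0))
print(grade_skin_test(5.0, 4.0))
```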

An alternative method for diagnosis depends on the in vitro detection of binding
between antibodies in a patient sample and the subject polypeptides, either as a cocktail
or as individual protein species, where the presence of specific binding is indicative of an
infection. Measuring the concentration of polypeptide-specific antibodies in a sample
or fraction thereof may be accomplished by a variety of specific assays. In general, the
assay will measure the reactivity between a patient sample, usually blood derived,
generally in the form of plasma or serum, and the subject polypeptides. The patient
sample may be used directly, or diluted as appropriate, usually about 1:10 and usually
not more than about 1:10,000. Immunoassays may be performed in any physiological
buffer, e.g. PBS, normal saline, HBSS, dPBS, etc.

In a preferred embodiment, a conventional sandwich type assay is used. A
sandwich assay is performed by first attaching the polypeptide to an insoluble surface or
support. The polypeptide may be bound to the surface by any convenient means,
depending upon the nature of the surface, either directly or through specific antibodies.
The particular manner of binding is not crucial so long as it is compatible with the
reagents and overall methods of the invention. They may be bound to the plates
covalently or non-covalently, preferably non-covalently. Samples, fractions or aliquots
thereof are then added to separately assayable supports (for example, separate wells of
a microtiter plate) containing support-bound polypeptide. Preferably, a series of
standards, containing known concentrations of antibodies, is assayed in parallel with the
samples or aliquots thereof to serve as controls.
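
As an illustration of how standards assayed in parallel can be used, the sketch below averages replicate wells and interpolates a sample concentration from a standard curve. It assumes linear interpolation between neighbouring standards and signals that increase with concentration; the function names, units and numbers are hypothetical and are not taken from the described assay.

```python
from statistics import mean


def standard_curve(standards):
    """Map {concentration: [replicate signals]} to sorted (mean signal, concentration) pairs."""
    return sorted((mean(signals), conc) for conc, signals in standards.items())


def interpolate(curve, signal):
    """Linearly interpolate a concentration between the two bracketing standards."""
    for (s0, c0), (s1, c1) in zip(curve, curve[1:]):
        if s0 <= signal <= s1:
            return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)
    raise ValueError("signal lies outside the range of the standards")


# Hypothetical readings: concentration (arbitrary units) -> duplicate well signals.
standards = {0.0: [0.05, 0.07], 10.0: [0.40, 0.44], 100.0: [1.20, 1.24]}
curve = standard_curve(standards)

sample_wells = [0.80, 0.84]
estimate = interpolate(curve, mean(sample_wells))
print(round(estimate, 2))
```

Real assays are often fitted with a sigmoidal model rather than straight segments; linear interpolation is used here only to keep the example short.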

Immune specific receptors may be labeled to facilitate direct, or indirect
quantification of binding. Examples of labels which permit direct measurement of second
receptor binding include radiolabels, such as 3H or 125I, fluorescers, dyes, beads,
chemiluminescers, colloidal particles, and the like. Examples of labels which permit
indirect measurement of binding include enzymes where the substrate may provide for a
colored or fluorescent product. In a preferred embodiment, the second receptors are
antibodies labeled with a covalently bound enzyme capable of providing a detectable
product signal after addition of suitable substrate. Examples of suitable enzymes for use
in conjugates include horseradish peroxidase, alkaline phosphatase, malate
dehydrogenase and the like. Where not commercially available, such antibody-enzyme
conjugates are readily produced by techniques known to those skilled in the art.

In some cases, a competitive assay will be used. In addition to the patient
sample, a competitor to the antibody is added to the reaction mix. The competitor and
the antibody compete for binding to the polypeptide. Usually, the competitor molecule
will be labeled and detected as previously described, where the amount of competitor
binding will be inversely proportional to the amount of antibody present. The
concentration of competitor molecule will be from about 10 times the maximum
anticipated antibody concentration to about equal concentration in order to provide the
most sensitive and linear range of detection.

Alternatively, antibodies may be used for direct determination of the presence of
the deletion marker polypeptide. Antibodies specific for the subject deletion markers as
previously described may be used in screening immunoassays. Samples, as used
herein, include microbial cultures, biological fluids such as tracheal lavage, blood, etc.
Also included in the term are derivatives and fractions of such fluids. Diagnosis may be
performed by a number of methods. The different methods all determine the absence or
presence of polypeptides encoded by the subject deletion markers. For example,
detection may utilize staining of mycobacterial cells or histological sections, performed in
accordance with conventional methods. The antibodies of interest are added to the cell
sample, and incubated for a period of time sufficient to allow binding to the epitope,
usually at least about 10 minutes. The antibody may be labeled with radioisotopes,
enzymes, fluorescers, chemiluminescers, or other labels for direct detection.
Alternatively, a second stage antibody or reagent is used to amplify the signal. Such
reagents are well known in the art. For example, the primary antibody may be
conjugated to biotin, with horseradish peroxidase-conjugated avidin added as a second
stage reagent. Final detection uses a substrate that undergoes a color change in the
presence of the peroxidase. The absence or presence of antibody binding may be
determined by various methods, including microscopy, radiography, scintillation counting,
etc.

An alternative method for diagnosis depends on the in vitro detection of binding
between antibodies and the subject polypeptides in solution, e.g. a cell lysate. Measuring
the concentration of binding in a sample or fraction thereof may be accomplished by a
variety of specific assays. A conventional sandwich type assay may be used. For
example, a sandwich assay may first attach specific antibodies to an insoluble surface or
support. The particular manner of binding is not crucial so long as it is compatible with
the reagents and overall methods of the invention. They may be bound to the plates
covalently or non-covalently, preferably non-covalently. The insoluble supports may be
any compositions to which polypeptides can be bound, which are readily separated from
soluble material, and which are otherwise compatible with the overall method. The surface
of such supports may be solid or porous and of any convenient shape. Examples of
suitable insoluble supports to which the receptor is bound include beads, e.g. magnetic
beads, membranes and microtiter plates. These are typically made of glass, plastic (e.g.
polystyrene), polysaccharides, nylon or nitrocellulose. Microtiter plates are especially
convenient because a large number of assays can be carried out simultaneously, using
small amounts of reagents and samples.

Samples are then added to separately assayable supports (for example, separate
wells of a microtiter plate) containing antibodies. Preferably, a series of standards,
containing known concentrations of the polypeptides, is assayed in parallel with the
samples or aliquots thereof to serve as controls. Preferably, each sample and standard
will be added to multiple wells so that mean values can be obtained for each. The
incubation time should be sufficient for binding; generally, from about 0.1 to 3 hr is
sufficient. After incubation, the insoluble support is generally washed of non-bound
components. Generally, a dilute non-ionic detergent medium at an appropriate pH,
generally 7-8, is used as a wash medium. From one to six washes may be employed,
with sufficient volume to thoroughly wash non-specifically bound proteins present in the
sample.

After washing, a solution containing a second antibody is applied. The antibody
will bind with sufficient specificity such that it can be distinguished from other components
present. The second antibodies may be labeled to facilitate direct, or indirect
quantitation of binding. Examples of labels that permit direct measurement of second
receptor binding include radiolabels, such as 3H or 125I, fluorescers, dyes, beads,
chemiluminescers, colloidal particles, and the like. Examples of labels which permit
indirect measurement of binding include enzymes where the substrate may provide for a
colored or fluorescent product. In a preferred embodiment, the antibodies are labeled
with a covalently bound enzyme capable of providing a detectable product signal after
addition of suitable substrate. Examples of suitable enzymes for use in conjugates
include horseradish peroxidase, alkaline phosphatase, malate dehydrogenase and the
like. Where not commercially available, such antibody-enzyme conjugates are readily
produced by techniques known to those skilled in the art. The incubation time should be
sufficient for the labeled ligand to bind available molecules. Generally, from about 0.1 to 3
hr is sufficient, usually 1 hr sufficing.

After the second binding step, the insoluble support is again washed free of
non-specifically bound material. The signal produced by the bound conjugate is detected by
conventional means. Where an enzyme conjugate is used, an appropriate enzyme
substrate is provided so a detectable product is formed.

Other immunoassays are known in the art and may find use as diagnostics.
Ouchterlony plates provide a simple determination of antibody binding. Western blots
may be performed on protein gels or protein spots on filters, using a detection system
specific for the polypeptide, conveniently using a labeling method as described for the
sandwich assay.

Recombinant Mycobacterium

Mycobacteria, particularly those of the M. tuberculosis complex, are genetically
engineered to contain specific deletions or insertions corresponding to the identified
genetic markers. In particular, attenuated BCG strains are modified to introduce deleted
genes encoding sequences important in the establishment of effective immunity.
Alternatively, M. bovis or M. tuberculosis are modified by homologous recombination to
create specific deletions in sequences that determine virulence, i.e. the bacteria are
attenuated through recombinant techniques.

In order to stably introduce sequences into BCG, the M. tb open reading frame
corresponding to one of the deletions in Table 1 is inserted into a vector that is
maintained in M. bovis strains. Preferably, the native 5' and 3' flanking sequences are
included, in order to provide for suitable regulation of transcription and translation.
However, in special circumstances, exogenous promoters and other regulatory regions
may be included. Vectors and methods of transfection for BCG are known in the art. For
example, U.S. Patent no. 5,776,465, herein incorporated by reference, describes the
introduction of exogenous genes into BCG.

In one embodiment of the invention, the complete deleted region is replaced in
BCG. The junctions of the deletion are determined as compared to a wild type M. tb. or
M. bovis sequence, for example as set forth in the experimental section. The deleted
region is cloned by any convenient method, as known in the art, e.g. PCR amplification of
the region, restriction endonuclease digestion, chemical synthesis, etc. Preferably the
cloned region will further comprise flanking sequences of a length sufficient to induce
homologous recombination, usually at least about 25 nt, more usually at least about 100
nt, or greater. Suitable vectors and methods are known in the art, for an example, see
Norman et al. (1995) Mol. Microbiol. 16:755-760.

In an alternative embodiment, one or more of the deletions provided in Table 1
are introduced into a strain of M. tuberculosis or M. bovis. Preferably such a strain is
reduced in virulence, e.g. H37Ra, etc. Methods of homologous recombination in order to
effect deletions in mycobacteria are known in the art, for examples see Norman et al.,
supra; Ganjam et al. (1991) P.N.A.S. 88:5433-5437; and Aldovini et al. (1993) J.
Bacteriol. 175:7282-7289. Deletions may comprise an open reading frame identified in
Table 1, or may extend to the full deletion, i.e. extending into flanking regions, and may
include multiple open reading frames.

The ability of the genetically altered mycobacterium to [...] tested in one or more
experimental models. For example, M. tb. is known to infect a variety of mammals, and
cells in culture. In one [...], preferably human macrophages, are infected. In a comparison
of virulent, avirulent and attenuated strains of the M. tuberculosis complex, alveolar or
[...] ([...] (1998) Infect. Immun. 66(3):[...]; Paul et al. (1996) J. Infect. Dis.
174(1):105-112). The percentages of cells infected by the strains and the initial numbers
of intracellular organisms are equivalent, as were levels of monocyte viability up to 7 days
following infection. However, intracellular growth reflects virulence over a period of one
or more weeks. Mycobacterial growth may be evaluated by acid-fast staining, electron
microscopy, and colony-forming units (cfu) assay. Monocyte production of tumor necrosis
factor alpha may also be monitored as a marker for virulence.

Other assays for virulence utilize animal models. The M. tuberculosis complex bacteria
are able to infect a wide variety of animal hosts. One model of particular interest is cavitary
tuberculosis produced in rabbits by aerosolized virulent tubercle bacilli (Converse et al.
(1996) Infect. Immun. 64(11):4776-4787). In liquefied caseum, the tubercle bacilli grow
extracellularly for the first time since the onset of the disease and can reach such large
numbers that mutants with antimicrobial resistance may develop. From a cavity, the
bacilli enter the bronchial tree and spread to other parts of the lung and also to other
people. Of the commonly used laboratory animals, the rabbit is the only one in which
cavitary tuberculosis can be readily produced.

Vaccines may be formulated according to methods known in the art. Vaccines of
the modified bacteria are administered to a host which may be exposed to virulent
tuberculosis. In many countries where tuberculosis is endemic, vaccination may be
performed at birth, with additional vaccinations as necessary. The compounds of the
present invention are administered at a dosage that is effective while minimizing
any side-effects. It is contemplated that the composition will be obtained and used under
the guidance of a physician.

Conventional vaccine strains of BCG may be formulated in a combination vaccine
with polypeptides identified in the present invention and produced as previously
described, in order to improve the efficacy of the vaccine.

Various methods for administration may be employed. The formulation may be
injected intramuscularly, intravascularly, subcutaneously, etc. The dosage will be
conventional. The bacteria can be formulated into pharmaceutical compositions by
combination with appropriate, pharmaceutically acceptable carriers or diluents, and may
be formulated into preparations in semi-solid or liquid forms, such as solutions, injections,
etc. The following methods and excipients are merely exemplary and are in no way
limiting.

The modified bacteria can be formulated into preparations for injections by
dissolving, suspending or emulsifying them in an aqueous or nonaqueous solvent, such
as vegetable or other similar oils, synthetic aliphatic acid glycerides, esters of higher
aliphatic acids or propylene glycol; and if desired, with conventional additives such as
solubilizers, isotonic agents, suspending agents, emulsifying agents, stabilizers and
preservatives. Unit dosage forms for injection or intravenous administration may
comprise the bacteria of the present invention in a composition as a solution in sterile
water, normal saline or another pharmaceutically acceptable carrier.

The term "unit dosage form," as used herein, refers to physically discrete units
suitable as unitary dosages for human and animal subjects, each unit containing a
predetermined quantity of vaccine, calculated in an amount sufficient to produce the
desired effect in association with a pharmaceutically acceptable diluent, carrier or vehicle.
The specifications for the unit dosage forms of the present invention depend on the
particular bacteria employed and the effect to be achieved, and the pharmacodynamics
associated with each complex in the host.

The pharmaceutically acceptable excipients, such as vehicles, adjuvants, carriers 
or diluents, are readily available to the public. Moreover, pharmaceutically acceptable 
auxiliary substances, such as pH adjusting and buffering agents, tonicity adjusting 
agents, stabilizers, wetting agents and the like, are readily available to the public. 


The following examples are put forth so as to provide those of ordinary skill in the 
art with a complete disclosure and description of how to make and use the subject 
invention, and are not intended to limit the scope of what is regarded as the invention. 
Efforts have been made to ensure accuracy with respect to the numbers used (e.g.
amounts, temperature, concentrations, etc.) but some experimental errors and deviations
should be allowed for. Unless otherwise indicated, parts are parts by weight, molecular 
weight is average molecular weight, temperature is in degrees centigrade; and pressure 
is at or near atmospheric. 






WO 00/11214 



PCT/US99/17939 



Experimental 

Methods: 

The technical methods used begin with extraction of whole genomic DNA from bacteria 
grown in culture. 
Day 1

Inoculate culture medium of choice (LJ/7H9) and incubate at 35° C until abundant
growth. Dispense 500 µl 1x TE into each tube. (If DNA is in liquid medium, no TE
needed.) Transfer loopful (sediment) of cells into microcentrifuge tube containing 500 µl
of 1x TE. If taking DNA from liquid medium, let cells collect in bottom of flask. Pipette cells
(about 1 ml) into tube. Heat 20 min at 80° C to kill cells, centrifuge, resuspend in 500 µl of
1x TE. Add 50 µl of 10 mg/ml lysozyme, vortex, incubate overnight at 37° C.

Day 2

Add 70 µl of 10% SDS and 10 µl proteinase K, vortex and incubate 20 min at 65°
C. Add 100 µl of 5 M NaCl. Add 100 µl of CTAB/NaCl solution, prewarmed at 65° C.
Vortex until liquid content is white ("milky"). Incubate 10 min at 65° C. Outside of hood,
prepare new microcentrifuge tubes labeled with culture # on top, and culture #, tube #,
date on side. Add 550 µl isopropanol to each and cap. Back in the hood, add 750 µl of
chloroform/isoamyl alcohol, vortex for 10 sec. Centrifuge at room temp for 5 min at
12,000 g. Transfer aqueous supernatant in 180 µl amounts to new tube using pipetter,
being careful to leave behind solids and non-aqueous liquid. Place 30 min at -20° C. Spin
15 min at room temp in a microcentrifuge at 12,000 g. Discard supernatant; leave about
20 µl above pellet. Add 1 ml cold 70% ethanol and turn tube a few times upside down.
Spin 5 min at room temp in a microcentrifuge. Discard supernatant; leave about 20 µl
above the pellet. Spin 1 min in a microcentrifuge and cautiously discard the last 20 µl of
supernatant just above the pellet using a pipetter (P-20). Be sure that all traces of ethanol
are removed. Allow pellet to dry at room temp for 10 min or speed vac 2-3 min. (Place
open tubes in speed vac, close lid, start rotor, turn on vacuum. After 3 min push red
button, turn off vacuum, turn off rotor. Check if pellets are dry by flicking tube to see if
pellet comes away from side of tube.) Redissolve the pellet in 20-50 µl of ddH2O. Small
pellets get 20, regular sized get 30 and very large get 50. DNA can be stored at 4° C for
further use.

DNA array. The array was made by spotting DNA fragments onto glass microscope slides
which were pretreated with poly-L-lysine. Spotting onto the array was accomplished by a
robotic arrayer. The DNA was cross-linked to the glass by ultraviolet irradiation, and the
free poly-L-lysine groups were blocked by treatment with 0.05% succinic anhydride, 50%
1-methyl-2-pyrrolidinone and 50% borate buffer.

The majority of spots on the array were PCR-derived products, produced by
selecting over 9000 primer pairs designed to amplify the predicted open reading frames
of the sequenced strain H37Rv (ftp.sanger.ac.uk/pub/TB.seq). Some internal standards
and negative control spots including plasmid vectors and non-M. tb. DNA were also on the
array.

Therefore, with the preparation for an array that contained the whole genome of
Mycobacterium tuberculosis, we compared BCG-Connaught to Mycobacterium
tuberculosis, using the array for competitive hybridization. The protocol follows:

DNA labeling protocol. Add 4 ng DNA in 20 µl H2O, 2 µl dN10N6 and 36 µl H2O. Add
2 µl DNA spike for each DNA sample, for total of 60 µl. Boil 3 minutes to denature DNA,
then snap cool on ice water bath. Add 1 µl dNTP (5 mM ACG), 10 µl 10x buffer, 4 µl
Klenow, 22 µl H2O to each tube. Add 3 µl of Cy3 or Cy5 dUTP, for total of 100 µl.
Incubate 3 hours at 37° C. Add 11 µl 3 M NaAc, 250 µl 100% EtOH to precipitate, store
O/N at -20° C. Centrifuge genomic samples 30 minutes at 13K to pellet precipitate.
Discard supernatant, add 70% EtOH, spin 15 minutes, discard sup and speed-vac to dry.
This provides DNA for two experiments.
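
The stated totals of the labeling reaction can be checked with a few lines of arithmetic. The sketch below assumes every volume in the labeling protocol is in microliters (the unit symbols are partly illegible in the source); the dictionary and function names are illustrative only.

```python
# Volumes in microliters, as read from the labeling protocol
# (assumption: all garbled unit symbols in the source denote microliters).
DENATURATION_MIX = {
    "DNA in H2O": 20.0,
    "dN10N6": 2.0,
    "H2O": 36.0,
    "DNA spike": 2.0,
}
EXTENSION_MIX = {
    "dNTP": 1.0,
    "10x buffer": 10.0,
    "Klenow": 4.0,
    "H2O": 22.0,
    "Cy3 or Cy5 dUTP": 3.0,
}


def total_ul(*mixes):
    """Sum the component volumes of one or more mixes."""
    return sum(volume for mix in mixes for volume in mix.values())


print(total_ul(DENATURATION_MIX))                 # 60.0, the stated first total
print(total_ul(DENATURATION_MIX, EXTENSION_MIX))  # 100.0, the stated final total
```

Both sums agree with the totals given in the protocol, which supports reading the small volumes as microliters rather than milliliters.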

DNA hybridization to microarray protocol. Resuspend the labeled DNA in 11 µl
dH2O (for 2 arrays). Run out 1 µl DNA on a 1.5% agarose gel to document sample to be
hybridized. Of the remaining 10 µl of solution, half will be used for this hyb, and half will
be left for a later date. Take 5 µl of Cy3 solution and add to the same amount of Cy5
solution, for total volume 10 µl mixed labeled DNA. Add 1 µl tRNA, 2.75 µl 20x SSC,
0.4 µl SDS, for total volume 14.1 µl. Place on slide at array site, cover with 22 mm
coverslip, put slide glass over and squeeze onto rubber devices, then hybridize 4 hours
at 65° C. After 4 hours, remove array slides from devices, leave coverslip on, and dip in
slide tray into wash buffer consisting of 1x SSC with 0.05% SDS for about 2 minutes.
Cover slip should fall off into bath. After 2 minutes in wash buffer, dip once into a bath
with 0.06x SSC, then rinse again in 0.06x SSC in separate bath. Dry slides in centrifuge
at about 600 rpm. They are now ready for scanning.

Fluorescence scanning and data acquisition. Fluorescence scanning was set for
20 microns/pixel and two readings were taken per pixel. Data for channel 1 was set to
collect fluorescence from Cy3 with excitation at 520 nm and emission at 550-600 nm.
Channel 2 collected signals excited at 647 nm and emitted at 660-705 nm, appropriate
for Cy5. No neutral density filters were applied to the signal from either channel, and the
photomultiplier tube gain was set to 5. Fine adjustments were then made to the
photomultiplier gain so that signals collected from the two spots containing genomic DNA
were equivalent.

To analyze the signal from each spot on the array, a 14X14 grid of boxes was
applied to the data collected from the array, and the signal within each box was
integrated and a value was assigned to the corresponding spot. A background value was
obtained for each spot by integrating the signals measured 2 pixels outside the perimeter
of the corresponding box. The signal and background values for each spot were
imported into a spreadsheet program for further analysis. The background values were
subtracted from the signals and a factor of 1.025 was applied to each value in channel 2
to normalize the data with respect to the signals from the genomic DNA spots.
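
The per-spot correction can be sketched as follows. This is a minimal illustration that assumes the factor of 1.025 is applied by multiplication to the background-subtracted channel 2 value; the function names and the example readings are hypothetical.

```python
CH2_FACTOR = 1.025  # normalization factor stated in the text


def normalize_spot(ch1_signal, ch1_background, ch2_signal, ch2_background,
                   factor=CH2_FACTOR):
    """Subtract background from each channel, then scale channel 2.

    Assumption: the factor is applied multiplicatively to channel 2.
    """
    ch1 = ch1_signal - ch1_background
    ch2 = (ch2_signal - ch2_background) * factor
    return ch1, ch2


def channel_ratio(ch1, ch2):
    """Ratio of corrected channel 1 (Cy3) to corrected channel 2 (Cy5)."""
    if ch2 <= 0:
        raise ValueError("corrected channel 2 value must be positive")
    return ch1 / ch2


# Hypothetical integrated values for one spot.
ch1, ch2 = normalize_spot(1200.0, 200.0, 1100.0, 100.0)
print(ch1, round(ch2, 3), round(channel_ratio(ch1, ch2), 3))
```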

Because the two samples are labeled with different fluorescent dyes, it is possible 
to determine that a spot of DNA on the array has hybridized to Mycobacterium 
tuberculosis (green dye) and not to BCG (red dye), thus demonstrating a likely deletion 
from the BCG genome. 

However, because the array now contains 4000 spots, one
may expect up to 100 spots with hybridization two standard deviations above or below
the mean. We have therefore adopted a screening protocol, where we look for
mismatched hybridization in two consecutive genes on the genome. Therefore, we are
essentially looking only for deletions of multiple genes at this point.
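
The screening rule above (outlying hybridization in two consecutive genes) can be sketched in a few lines. The sketch assumes that spots are listed in genome order, that outliers are judged on log2 ratios, and that "two standard deviations" refers to the spread across all spots on the array; these choices and the function names are illustrative assumptions.

```python
from math import log2
from statistics import mean, stdev


def flag_outliers(ratios, n_sd=2.0):
    """Flag spots whose log2 ratio lies more than n_sd standard deviations from the mean."""
    logs = [log2(r) for r in ratios]
    mu, sd = mean(logs), stdev(logs)
    return [abs(x - mu) > n_sd * sd for x in logs]


def consecutive_runs(flags, min_run=2):
    """Return (first, last) index pairs for runs of at least min_run flagged genes."""
    runs, start = [], None
    for i, flagged in enumerate(list(flags) + [False]):  # sentinel closes a trailing run
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return runs


# Hypothetical ratios in genome order: two adjacent genes stand out.
ratios = [1.0] * 10 + [8.0, 8.0] + [1.0] * 10
print(consecutive_runs(flag_outliers(ratios)))
```

An isolated outlying spot is ignored by `consecutive_runs`, which is the point of requiring two adjacent genes before following up with Southern hybridization.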

To confirm that a gene or group of genes is deleted, we perform Southern
hybridization, employing a separate probe from the DNA on the array. Digestions of
different mycobacterium DNAs are run on an agarose gel, and transferred to
membranes. The membranes can be repeatedly used, probing for different DNA
sequences. For the purposes of this project, we include DNA from the reference strain of
Mycobacterium tuberculosis (H37Rv), from other laboratory strains, such as H37Ra, the
O strain, from clinical isolates, from the reference strain of Mycobacterium bovis, and
from different strains of Mycobacterium bovis BCG.

Once a deletion is confirmed by Southern hybridization, we then set out to
characterize the exact genomic location. This is done by using polymerase chain
reaction, with primers designed to be close to the edges of the deletion, see Talbot
(1997) J. Clin. Micro. 35:566-9.

Primers have been chosen to amplify across the deleted region. Only in the
absence of this region does one obtain an amplicon. PCR products were examined by
electrophoresis (1.5% agarose) and ethidium bromide staining.

Once a short amplicon is obtained, this amplicon is then sequenced. A search of 
the genome database is performed to determine whether the sequence is exactly 
identical to one part of the Mycobacterium tuberculosis genome, and that the next part of 
the amplicon is exactly identical to another part of the Mycobacterium tuberculosis 
genome. This permits precise identification of the site of deletion. 
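
The junction-mapping step can be illustrated with a small sketch. It assumes exact, unique matches between the amplicon and the reference genome instead of a real database search, and it uses 0-based coordinates; the function name `locate_deletion` and the toy sequences are hypothetical.

```python
def locate_deletion(amplicon: str, genome: str, min_len: int = 10):
    """Return (start, end) of the deleted interval [start, end), or None.

    The amplicon is split at each candidate junction; the left part must match
    the genome, and the right part must match further downstream.
    """
    for split in range(min_len, len(amplicon) - min_len + 1):
        left, right = amplicon[:split], amplicon[split:]
        i = genome.find(left)
        if i < 0:
            break  # a longer left part cannot match either
        left_end = i + split
        j = genome.find(right, left_end)
        if j > left_end:  # a gap between the two matches is the deleted region
            return left_end, j
    return None


# Hypothetical reference: 20 nt flank, 15 nt deleted region, 20 nt flank.
genome = "AACCGGTTACGTAGCTAGGA" + "TTTTTTTTTTTTTTT" + "CATGCATGGACCTGAACGTC"
amplicon = genome[5:20] + genome[35:55]  # bridging product from a deleted strain

print(locate_deletion(amplicon, genome))
```

Repeated sequences or shared bases at the junction can make the split ambiguous; this sketch simply reports the first consistent split it finds.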

Below follows an example of the kind of report obtained: 
rd6 bridging PCR, blast search of sequence

emb|Z79701|MTCY277 Mycobacterium tuberculosis cosmid Y277
Length = 38,908
Plus Strand HSPs:

Score = 643 (177.7 bits), Expect = 1.6e-54, Sum P(2) = 1.6e-54
Identities = 129/131 (98%), Positives = 129/131 (98%), Strand = Plus / Plus

Query:    12 ANTAGTAATGTGCGAGCTGAGCGATGTCGCCGCTCCCAAAAATTACCAATGGTTNGGTCA 71
             | |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
Sbjct: 24784 AGTAGTAATGTGCGAGCTGAGCGATGTCGCCGCTCCCAAAAATTACCAATGGTTTGGTCA 24843

Query:    72 TGACGCCTTCCTAACCAGAATTGTGAATTCATACAAGCCGTAGTCGTGCAGAAGCGCAAC 131
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 24844 TGACGCCTTCCTAACCAGAATTGTGAATTCATACAAGCCGTAGTCGTGCAGAAGCGCAAC 24903

Query:   132 ACTCTTGGAGT 142
             |||||||||||
Sbjct: 24904 ACTCTTGGAGT 24914

Score = 224 (61.9 bits), Expect = 1.6e-54, Sum P(2) = 1.6e-54
Identities = 46/49 (93%), Positives = 46/49 (93%), Strand = Plus / Plus

Query:   141 GTGGCCTACAACGGNGCTCTCCGNGGCGCGGGCGTACCGGATATCTTAG 189
             | |||||||||||| |||||||| |||||||||||||||||||||||||
Sbjct: 37645 GCGGCCTACAACGGCGCTCTCCGCGGCGCGGGCGTACCGGATATGTTAG 37693

This process is repeated with each suggested deletion, beginning with the three
previously described deletions to serve as controls. Sixteen deletions have been
identified by these methods, and are listed in Table 1.


It is to be understood that this invention is not limited to the particular
methodology, protocols, formulations and reagents described, as such may of course
vary. It is also to be understood that the terminology used herein is for the purpose of
describing particular embodiments only, and is not intended to limit the scope of the
present invention which will be limited only by the appended claims.

It must be noted that as used herein and in the appended claims, the singular
forms "a", "and", and "the" include plural referents unless the context clearly dictates
otherwise. Thus, for example, reference to "a complex" includes a plurality of such
complexes and reference to "the formulation" includes reference to one or more
formulations and equivalents thereof known to those skilled in the art, and so forth.

Unless defined otherwise, all technical and scientific terms used herein have the
same meaning as commonly understood to one of ordinary skill in the art to which this
invention belongs. Although any methods, devices and materials similar or equivalent to
those described herein can be used in the practice or testing of the invention, the
preferred methods, devices and materials are now described.

All publications mentioned herein are incorporated herein by reference for the
purpose of describing and disclosing, for example, the cell lines, constructs, and
methodologies that are described in the publications which might be used in connection
with the presently described invention. The publications discussed above and throughout
the text are provided solely for their disclosure prior to the filing date of the present
application. Nothing herein is to be construed as an admission that the inventors are not
entitled to antedate such disclosure by virtue of prior invention.

What is Claimed is:

1. A nucleic acid comprising a junction of a deletion marker in Table 1. 

2. The nucleic acid of Claim 1, wherein said nucleic acid hybridizes to a M. 
tuberculosis complex genome when the deletion is present, but not in an undeleted 
genome. 

3. The nucleic acid of Claim 1, wherein said nucleic acid is from 15 to 25 
nucleotides in length. 

4. The nucleic acid of Claim 1 , wherein said M. tuberculosis complex genome 
is BCG. 

5. The nucleic acid of Claim 1 , wherein said M. tuberculosis complex genome 
is a variant of M. tuberculosis. 

6. The nucleic acid of Claim 1 t wherein said M. tuberculosis complex genome 
is M. bovis. 

7. A pair of hybridization primers comprising the nucleic acid of Claim 1 B and 
a second nucleic acid that hybridizes to a second site in an M. tuberculosis complex 
genome. 

8. A genetically altered mycobacterium, comprising an exogenous nucleic 
acid sequence comprising one or more deletion markers as set forth in Table 1. 

9. The genetically altered mycobacterium of Claim 8, wherein said 
mycobacterium is BCG, and wherein said deletion marker is deleted in BCG according to 
Table 1. 

10. The mycobacterium of Claim 8, further comprising a physiologically 
acceptable carrier for injection. 

11. A genetically altered mycobacterium, comprising a deletion resulting from 
homologous recombination in a deletion marker as set forth in Table 1. 

12. The genetically altered mycobacterium according to Claim 11, wherein 
said mycobacterium is M. bovis. 

13. The mycobacterium of Claim 12, further comprising a physiologically 
acceptable carrier for injection. 

14. The genetically altered mycobacterium according to Claim 11, wherein 
said mycobacterium is M. tuberculosis. 

15. The mycobacterium of Claim 14, further comprising a physiologically 
acceptable carrier for injection. 

16. A method of distinguishing whether a patient has been exposed to BCG or 
to M. tuberculosis, the method comprising: 

contacting said patient or a sample derived therefrom with a polypeptide encoded 
by a deletion marker of Table 1, wherein said deletion marker is present in M. 
tuberculosis and absent in BCG; and 

determining the presence of an immune reaction to said polypeptide, wherein a 
positive response is indicative of exposure to M. tuberculosis. 

17. The method of Claim 16, wherein said contacting step comprises 
sub-cutaneous injection of said polypeptide. 

18. The method of Claim 16, wherein said contacting step is performed in vitro 
and said sample comprises a blood sample or derivative thereof. 

19. A method of distinguishing a bacterial strain of the M. tuberculosis 
complex, the method comprising: 

determining the presence of a deletion marker in Table 1, wherein said deletion is 
absent in at least one of said candidate strains; 

wherein the presence of said deletion marker is indicative that said strain is not 
said candidate strain. 

20. The method according to Claim 19, wherein said determining step 
comprises nucleic acid hybridization to said deletion marker. 

21. The method according to Claim 19, wherein said determining step 
comprises antibody binding to a polypeptide encoded by said deletion marker. 

22. The method according to Claim 19, wherein said determining step 
comprises PCR amplification across said deletion. 

23. The method according to Claim 19, wherein said determining step 
comprises hybridization to a junction sequence associated with said deletion marker. 

AMENDED CLAIMS 

[received by the International Bureau on 25 February 2000 (25.02.00); 
new claims 24-33 added; remaining claims unchanged (1 page)] 

24. A polypeptide encoded by an open reading frame as identified in Table 1, 
or a fragment thereof. 

25. A fusion polypeptide, comprising the polypeptide of Claim 24, and a fusion 
partner. 

26. A vaccine comprising at least one polypeptide or a fragment thereof 
according to claim 24. 

27. A vaccine comprising at least one fusion polypeptide or a fragment thereof 
according to claim 25. 

28. A vaccine comprising a BCG strain and at least one polypeptide or 
fragment thereof according to claim 24. 

29. A vaccine comprising a BCG strain and at least one fusion polypeptide or 
a fragment thereof according to claim 25. 

30. An antibody that binds specifically to a polypeptide or a fragment thereof 
according to claim 24. 

31. A monoclonal antibody according to claim 30. 

32. A method of producing a polypeptide or a fragment thereof according to 
claim 24, the method comprising: 

inserting an expression cassette that is functional in a prokaryotic or eukaryotic 
expression host, 

expressing the polypeptide in the expression host used; 

recovering expressed polypeptide from the expression host; and 

isolating the polypeptide. 

33. A method of producing a polypeptide or a fragment thereof according to 
claim 24, the method comprising: 

synthesizing the polypeptide by peptide synthesis. 